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We consider multiscale stochastic systems that are partially observed at 
crete points of the slow time scale. We introduce a particle filter that takes 
advantage of the multiscale structure of the system to efficiently approximate 
the optimal filter. 
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§J 1 Introduction 

We are interested in the problem of estimating a function of a multiscale process that 

can be approximated by a diffusion which lives in the slow scale, when it is partially 

■ observed. Such problems come up in many applications, such as molecular dynamics, 

climate modelling or estimation of stochastic volatility using agent-based models (see 

[8] for general discussion of multiscale models and [I] and [9] for applications to kinetic 

^ ! Monte-Carlo and climate modelling respectively). 

In this paper, we focus on the problem of estimating the slow component of a COntin- 
Cw ' 

uous multiscale process from partial and discrete observations of it. More specifically, 
we have an W +q process X e = (X^) t >o = (xf: 1 '^, xj: 2 '^) t > that satisfies the following 
multiscale stochastic differential equation: 

dX^ = a(xjM,xjM)dt + a x {X^\x^)dwP , . 

dxf^ = \b{X^\x^)dt + ^{X?*\x™)dW?> {) 

where Xf' e ' 6 W, xj: 2 '^ G R q and and are two independent Wiener 

processes in W and W 1 respectively. Let /i be the initial distribution, i.e. /i = C(Xq). 
We denote by /ii and /i 2 the marginals on an d Xq 2 '^ respectively. 
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We observe the process (X t e ) t > through Y e = (Y kA ) k= o,...,T, where A ~ O(l), i.e. 
the observations live in the the same scale as X\ l ' , which we call the slow time scale. 
In fact, let us assume for simplicity that A = 1. The process Y e is given by 

Y* = h(x£> e \v k ), (2) 

where (vk)k are i.i.d. random variables with known distribution. 

Our goal is to compute the conditional distribution of the slow process xf: 1 '^ given 
the observations, at observation points k = 1,...,T, or, equivalently, compute the 
expectations 

7ri(f):=E(f(xi 1 ^)\YU=l,...,k) (3) 

for all continuous bounded functions / on M p , i.e. / G Cb(M. p ), and k = 0, . . . ,T. 

In [2], the authors discuss this problem for an arbitrary diffusion (X t ) t > and they 
develop a particle filter that approximates the conditional distribution (also called 
optimal filter). The additional difficulty compared to discrete systems is that of 
simulating the process (X t ) t between the observation points, i.e. simulate Xr k+ i\ 
given X k . In j2], this is done by applying the Euler discretization scheme. The step 
size is given as a function of the number of particles used by the particle filter, chosen 
so as to optimize the convergence rate. An alternative approach has been recently 
suggested in [5]. 

In the case of multiscale diffusions, both the Euler discretization scheme and the 
MCMC method described in [5] become inefficient. However, if we are only interested 
in the slow scale marginal of the optimal filter given by Q, we can avoid these 
problems by replacing the multiscale diffusion by the approximation of the slow scale 
constructed by applying the averaging principle. When the averaged equation is not 
available in closed form, we construct a further approximation of its drift and variance 
using short simulations of the multiscale system (see [31 [7]). 

In section [21 we review some of the basic results of [2] for discretely and partially 
observed diffusions. In section [31 we describe the algorithm and analyze the approx- 
imation error in the case were the averaged equation is available in closed form. In 
section HJ we do the same for the case were the averaged equation is not available 
in closed form and we apply the heterogeneous multiscale method to approximate it. 
Finally, in section 4, we discuss how to extent this approach to continuous observation 
processes. 

2 Discretely and partially observed diffusions: a 
review 

Suppose that X = (X t )t>o, with X t G M, d , is a diffusion of the form 

dX t = a(X t )dt + a(X t )dW t , (4) 
with initial distribution \i = C(Xq). The diffusion process is observed through 

Y k = h(X kj v k ), (5) 
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where (v k ) k >o are i.i.d. random variables, such that the conditional probability admits 
a density g, i.e. F(Y k G dy\X k = x) = g(x,y)dy, and g is bounded and explicitly 
known. We want to approximate the optimal filter 

n k = F(X k \Y e ,£=l,...,k). (6) 

In [2], the authors approximate ([6]) using a combination of the Euler method and the 
discrete particle filter. The exact algorithm is as follows: 

• Initialization (k=0): Simulate N independent random variables (^)^ =1 from 
the initial distribution fi. 

• For k > 

1. Evolution: Simulate 



where p± (x, •) is the forward Euler approximation with step ^7= of the 
transition kernel 

p 1 (x,-) = F(X 1 G -\X = x) 



2. Resampling: Simulate N new random variables (£ k )j =1 from 



1 N j 

7 =1 l^i=l W k 



k -5, 



where the weights (w 3 k )jL 1 are the likelihood of observing Y k if X k = 
i.e. := 

Then, the particle filter n k = jt Y2i=i S f] converges weakly to the optimal filter 7r k 
defined in ([6]). More precisely, the following holds: 

Theorem 2.1 (Del Moral - Jacod - Protter, [2]). For all bounded Borel functions 
f G i3fc(lR d ), all k = 0, . . . ,T and all N > 0, the approximation error will be bounded 
by 

^K(f)-^(f)\<^=\\f\u (7) 

under the following assumptions 

1. The functions a(-) and a(-) are two times differentiable with bounded derivatives 
of all orders up to two. 

2. The covariance matrix is uniformly non degenerate, i.e. cra*(-) > i] > 0. 

The constant C k depends on the drift and variance of the diffusion, the likelihood 
function g and k. 
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If the likelihood function g(x,y) is bounded above and below, i.e. there exists a 
constant K such that < g(x, y) < K for all x and y, then the constant in 
theorem 12.11 takes the following form: 

C (? (8i^) fc+1 - 8K*r 
C k = (2 + 2a) 8R2T _ 1 , 

where a is such that 

,(JV) ■ - a 



sup|pi 70) -Pif(x)\ < 
x y JS 

with p^ and pi as above. So, a is the constant that appears in the upper bound of 
the error of the approximation of the distribution of Xi by the forward Euler method 
(see (Tj). Consequently, if we apply the Euler discretization method to the multiscale 
system (JIJ), the constant a will be of order a ~ O(-). 

Corollary 2.2. Under the assumptions of theorem \2.1\ if the diffusion process and 
the observations are of the form (TJJ) and respectively, then the error of the slow 
scale marginal of the particle filter described above becomes 



E 



7T 



e,N , 
k 



(f)~<(f) <-%\\f\U (8) 



The above corollary shows that if the diffusion process that we want to estimate 
is a multiscale diffusion, the particle filter described in [2] will no longer be efficient, 
just as the Euler discretization method will not be efficient. 



3 The multiscale case 

Since the observations live in the slow scale, we can only hope to get a good approx- 
imation of the slow scale marginal of the optimal filter and, consequently, we focus 
on the approximation of tt|(/) given by A quite natural thing to do in order to 
avoid simulating the whole multiscale process - which, as we have already seen, is 
problematic - is to try and replace the slow scale process (Xl' e ) t by a diffusion (X t ) t 
in MP that does not depend on the fast scale process (X^ ,e ) t . This is, indeed, possible 
under the following assumption: 3A > such that Vxi G MP and Vx2, x' 2 G MP, 

< x 2 - % , 2 ,b(x 1 ,x 2 ) - fr^i,^) > +\\a 2 (x 1 ,x 2 ) - a 2 (x 1 ,x' 2 )\\ 2 < —X\x 2 - x' 2 \ 2 , (9) 

where <•,•>, | • | and || • || denote the Euclidean inner product and norm in MP and 
the Frobenius norm in M q x MP, respectively. In other words, we require both b and 
a to grow sublinearly. This assumption implies that if we fix Xl ,e = x\ in (pQ), X 2,e 
converges to its unique invariant distribution v xi exponentially fast, with rate -. In 
fact, the necessary assumption is not dHJ) but this exponential ergodicity property. 
We approximate the process (Xl' € ) t by the diffusion process (X t ) t satisfying 



dX t = a(X t )dt + a{X t )dW t , X ~ m = C(X^), 



(10) 
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where 

a(x) — I a(x, z)v x (dz) (11) 



and 

a{x) = ( / a^zfv^dz) ) (12) 



From now on, let us assume that the assumptions of theorem 12. II and (Q hold. Then, 
it is a well-known result, often referred to as the averaging principle, that X^' e — > X t 
as e — > 0. More specifically, the following holds (see [6]): 

sup \Ef{X}*) - Ef(X t )\ < C f , T • e, V/ e C 6 (R P ). (13) 

0<i<T 

This estimate suggests that we can approximate 7T^(/) given by ((31) by 7ffe(/) defined 
by 

7f fc (/) := E (/(X fc ) | = Yl, I = 1, . . . , k) , (14) 

where Yf. = h(Xk,Vk) and (vk)k are i-i.d. random variables as in (j2J). Indeed, it is a 
straight forward consequence of (|13p and Proposition 2.1 of [2] that 

-**(/)! <Cie||/lloe. 

If we cannot compute 7^ explicitly, we approximate it by the particle filter 7f^ de- 
scribed in section [2j Then, the total error will be bounded by 

E k(/) - 7f(/)| < C f e + ^=1 ll/lloo. (15) 



Thus, if e is small, it is much more efficient to approximate n e k by n k rather than 
TTfc i n ©, i.e. if we are willing to accept an approximation error of order 5, we will, 
in general, achieve this with a much smaller number of simulations (and computing 
time) if we compute rather than 7r^ N . 



4 Approximating the averaged equation 

In the previous section, we argued that it is, in general, more efficient to approxi- 
mate the slow marginal of the optimal filter 7r^ by replacing the slow component of 
multiscale diffusion by another diffusion, which we call averaged diffusion, and then 
applying the particle filter algorithm, rather than applying it directly to the multiscale 
diffusion. However, in order to simulate the averaged diffusion (X t )t that replaces the 
slow scale process (Xl' e ) t , we need to know its drift and its standard deviation given 
by (TTTj) and (fT2"|) respectively. In most cases, these are not going to be explicitly 
known. Then, we replace (fTTj) and (Tl2|) by their Monte Carlo estimates, as in [3]. 

First, we define a new family of diffusion processes as follows. For each x G M. p , 
we define the process Z t (x) as the solution of the following stochastic differential 
equation: 

dZ t {x) = b(x, Z t (x))dt + a 2 {x, Z t (x))dV t , Z (x) ~ /x 2 , (16) 
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where Vt is an IR 9 -valued Wiener process. We also define a new approximation to the 
transition kernel pi(x, •), where pt(x, •) := P [X t G -\Xq — x), so that we can simulate 
from it exactly, as follows: 

1. For k = 0: 

Simulate M independent random variables (Q n )iLi from the forward Euler 
approximation to the distribution of Z n (x) defined in (TIB]) , with step St and 
initial distribution /x 2 - Simulate £i from 



6 ~ Gsn I A* a ( x ' Cfc,«)) j . At E 

where we denote by Gsn(/i, r 2 ) the Gaussian distribution with mean \i and 
variance r 2 . Note that we implicitly assume that q = 1, in order to simplify 
notation. 

2. Ebr* = l,...,|_2bJ -1: 

For all i = 1, . . . , M, set C k0 = Cl-i n aric ^ simulate Q l k n fr° m the forward Euler 
approximation to the transition kernel P (Z n (£fc) £ "l^o(Cifc) = Co) with step <5t. 
Then, simulate from 



-i \2 



~ Gsn 



3. ForA;= |_£J : 

As in the previous step, set Cfc o = Cfc-i n anc ^ simulate Q. n from the forward 
Euler approximation to the transition kernel P(Z n (£ fc ) G -|Z (^) = (| ) with 
step <5£. Then, simulate X\ from 

*i ~ Gsn ^(1 - kAt) £ a (&, QJ) j , (1 - *At) ^ £ £ J 2 j j 

One can extent the weak convergence theorem in [3] for <j\ ^ 0, to get an estimate 
of the approximation error of the transition kernel. More specifically, 

*t + 6t+ ^—-r^(At + At 2 ) + — J (17) 

Let us now define a new particle filter, similar to the one in section [2J only the 
evolution of the particles between observation points follows the algorithm above, for 
5t = At = and n = M = 1. The choice M = 1 might seem surprising at first, but 
actually gives optimal bounds (see [3], section 2.4). The reason is that the Monte- 
Carlo estimation is done by averaging both in time and independent realizations but 
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averaging in time also improves the initialization error. So, it is in theory preferable 
to average one long path rather than many short ones. 

Let us name this new particle filter . Notice that for these values of At, St, M, n, 
ffTTj) becomes 

Ie/cxo-e/cxoi^c,-^ 

Then, the total approximation error becomes 

E H(f) - *F(J)\ < C (e + -^j \\f\U (18) 

To study the efficiency of this particle filter, suppose that we want to achieve a total 
error of order 0(e). Then, if we apply the particle filter algorithm of section [2] to the 
multiscale system, the number of simulations needed will be of order 0(-^(p + q)): at 
each step, we simulate NyN (p+q) random variables - we need y/~N (p+q) simulations 
for the evolution of each particle and we have N particles - and we need N ~ 0(\), 
since the total error is given by (jSJ). 

On the other hand, if we approximate the optimal filter by 7r^, we need N ~ O(-s) 
to get a total error of order 0(e). For this particle filter, the number of random 
variables we simulate at each step is N(\fNp + Nq) - N is the number of particles 
and we simulate y/Np+Nq random variables for the evolution of each particle. Notice 
that since M = 1, we estimate the drift and variance of (fIU|) using the final value of 
only one path of the appropriate process Z t (-). The reason why we discard the rest of 
the path is to allow the distribution oi Z t (-) to get close to the invariant distribution 
of the process. Consequently, we need a total of 0(^p + -^q) simulations, which 
shows that we can, indeed, achieve substantial improvement in the efficiency of the 
algorithm by replacing the multiscale system by the averaged diffusion, even when 
this not explicitly known. 

Remark 4.1. In order to approximate the drift and variance of the averaged process, 
we need to be able to simulate random variables from the invariant distributions v x , 
for the appropriate x. We do that by simulating the process Z t (x), whose distribution 
converges exponentially fast to the invariant distribution v x . Notice, however, that 
if x and x' are close, the distributions v x and v x i will also be close as a result of 
the smoothness of the drift and variance. Thus, we can improve the efficiency of the 
algorithm further by correlating the simulations of Z t (x) and Z t (x') as in flTjj / or by 
using the simulations of one process to initialize the other. 

5 Conclusions 

This analysis can also be applied for more general observation processes. For example, 
suppose that we observe (Y£)1 =1 , where is the solution of the following SDE: 

dY t * = h{X^\x^\Y:)dt + r{X^\x^\Y:)dV t , Y = 0. (19) 
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Then, we can replace ( JT9l) by its averaged approximation 

dY t = h(X t ,Y t )dt + f(X t ,Y t )dV t , Y = 0, (20) 

for 

h= h(x,z,y)v x (dz) and f=[ r(x, z,y) 2 u x (dz) 

Notice that, by the averaging principle, (X^ ,Y t e ) — > (X t , 1^), as e — >■ 0. Then, we 
can apply the particle filter described in [2] for this type of observation process and 
approximate -k\ by 

7r k =F(X k \Y e = Y^e = l,...,k). 

We expect that the efficiency of the algorithm will also be improved in this case. 

In this paper, we introduced a particle filter for the estimation of a quantity (Xj: ) 
that can be approximated by a diffusion given discrete and partial observations of it, 
in the case where this quantity is the slow component of a multiscale diffusion of the 
form (pQ). The main idea is that rather than evolving the particles by simulating the 
full multiscale system which can be very inefficient, it is better to do a short runs of 
of the full multiscale system and use these simulations to locally estimate the drift 
and variance of the diffusion that approximates the evolution of the partially observed 
quantity. Depending on the multiscale system and the approximate diffusion, one can 
use different methods for the estimation of the diffusion parameters and the evolution 
of the particles that follow the diffusion, rather than the Monte-Carlo estimation and 
the Euler simulation discussed above. 
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